What people want (to know)

Pietro Gravino 1, Giulio Prevedello 1, Annette Werth 1, Vittorio Loreto 1,2,3
1 SONY Computer Science Lab - Paris
2 Sapienza - University of Rome
3 Complexity Science Hub Vienna

Handling a critical and unprecedented emergency in the Information Age inevitably involves the information dynamics layer of the problem, and the COVID-19 emergency is no exception. What are the informational needs of a population confronted with an emergency? How do these needs evolve over time? How are they influenced by institutional interventions? How do they spread geographically? These are just some of the initial questions our team is tackling.

In this preliminary and still ongoing study, we introduce the first metric we will use to address these questions and to visualize how informational needs evolve in time and space, as well as from a semantic point of view. The main metric is the Searches index, which gives insight into the popularity of top search queries in Google Search across regions and languages. The index is obtained from data extracted from Google Trends, which are proportional to the number of queries made by Google Search users.

We chose Italy as a case study and started by downloading data on searches for the term "coronavirus" in Italy since January the 1st, 2020. As a reference we used the nationwide searches, normalized on a scale from 0 to 100 (the maximum peak is reached on February the 23rd, 2020). All the regional indices are rescaled according to their relative importance compared to the national result. We also looked at related queries, i.e. the words people searched together with the term "coronavirus". We consider this search index a reasonable proxy for the informational need of the Italian population during the COVID-19 emergency.
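The normalization described above can be sketched on toy numbers. This is only an illustration under stated assumptions, not the authors' actual procedure: the function names, the sample values, and in particular the per-region `weights` (a hypothetical measure of each region's importance relative to the national result) are invented for the example.

```python
from typing import Dict, List


def normalize_to_peak(values: List[float], peak: float = 100.0) -> List[float]:
    """Rescale non-negative values so that the largest one equals `peak`."""
    top = max(values)
    if top <= 0:
        return [0.0] * len(values)
    return [peak * v / top for v in values]


def rescale_regions(
    regional: Dict[str, List[float]], weights: Dict[str, float]
) -> Dict[str, List[float]]:
    """Put regional series on the national 0-100 scale.

    Assumption: each regional series is first normalized to its own peak,
    then multiplied by a hypothetical weight expressing the region's
    importance relative to the national reference (1.0 = same as national).
    """
    return {
        region: [weights[region] * v for v in normalize_to_peak(series)]
        for region, series in regional.items()
    }


# Toy data (invented): raw daily search volumes.
national_raw = [2, 5, 40, 80, 60]
regional_raw = {
    "Lombardia": [1, 4, 30, 50, 45],
    "Campania": [0, 1, 5, 20, 18],
}
weights = {"Lombardia": 1.2, "Campania": 0.6}  # hypothetical relative weights

national_index = normalize_to_peak(national_raw)
regional_index = rescale_regions(regional_raw, weights)
```

With this scheme the national series peaks at 100 by construction, while a regional series may end up above or below 100 depending on its assumed weight.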


Searches Evolution over Time

The graph for this section shows the evolution in time of the Searches index, nationwide and for each region. The value has to be interpreted as an average, so it does not depend on the regional population. By clicking on a region's name in the legend, you can show or hide the index's evolution for that region. By clicking on the intervention dropdown list, you can also overlay the timelines of the various types of interventions taken by the government.

Several interesting features emerge from this simple visualization and will need further investigation to be fully understood. Schematically, these are some possible interpretations:

  • Although the first interventions date back to the early days of January, interest seems to ignite only when the airport restriction for travelers coming from China was introduced (January the 24th).
  • The Searches index kept rising until a first peak on January the 31st, when the Italian authorities declared the State of Emergency. Interestingly enough, from that moment on the interest seems to decrease rapidly and stabilize at a relatively low value.
  • The real spike of interest occurred in the crucial days of the epidemic in Italy, between the 21st and the 23rd of February. The Searches index reached its maximum when the government announced the lockdown of several cities in northern Italy.
  • The decrease after the maximum peak is almost as fast as the increase, and it is probably due to news and informational campaigns (like institutional communication on national TV). This time, however, the index stabilizes at a much higher value than before the peak.
  • The trend inversion around the first days of March is probably linked to the national closure of all educational institutions. The index starts to grow again, reaching a new maximum on the day of the national lockdown (March the 9th). From that moment onward, we observe a slow but steady decrease of interest in the topic.
Next, we wondered how these phenomena spread through the country, so we explored the differences between the Italian regions.


The Geographical Evolution of Searches

It is also interesting to see how the Google searches for `Coronavirus` spread throughout the country. The virus first surged in Lombardy, which resulted in earlier and stricter measures there than in the rest of Italy. We assumed that, despite being within the same nation and subject to the same national TV coverage and measures, the geographical proximity of the COVID-19 crisis has a large impact on people’s search behaviour. The following animation shows the regional searches over time.

As expected, we observe that the initial spikes in Lombardia spread first to the neighbouring regions before reaching the whole of Italy within days. This indicates that the geographic closeness of both measures and news strongly affects the search patterns. By the time of the national lockdown, a relatively uniform search pattern is observed. As in the previous graph, we see a close relation between search interest and government measures, which we believe to be closer than the relation with actual epidemic data.

The next question is what people most want to know about “coronavirus”, i.e. which terms they search for together with it.


Ranks of Related Searches

Since the previous panel showed regional differences, we sought to understand whether these may correspond to different information needs associated with the term "Coronavirus". In the following figure, we investigate the searches for coronavirus-related queries, at national and regional level, and how their weekly ranking evolves over time. Note that the Google Trends data were preprocessed to merge different keywords that point to semantically the same query.
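The preprocessing and weekly ranking mentioned above could look roughly like the following sketch. It is not the authors' pipeline: the synonym table `CANONICAL`, the toy scores, and the choice to sum the scores of merged keywords are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical synonym table: maps keyword variants to one canonical query.
CANONICAL = {
    "news": "news",
    "notizie": "news",
    "maps": "maps",
    "mappa": "maps",
}


def merge_queries(scores: Dict[str, float]) -> Dict[str, float]:
    """Merge keywords pointing to the same query by summing their scores.

    Summing is an assumption; keywords missing from CANONICAL are kept
    as-is (lower-cased and stripped).
    """
    merged: Dict[str, float] = defaultdict(float)
    for query, score in scores.items():
        key = query.lower().strip()
        merged[CANONICAL.get(key, key)] += score
    return dict(merged)


def rank_queries(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank queries from 1 (most searched); ties are broken alphabetically."""
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return {query: position for position, (query, _) in enumerate(ordered, start=1)}


# Toy weekly scores (invented), keyed by the week's starting date.
weekly_scores = {
    "2020-02-24": {"News": 60, "notizie": 50, "Italia": 90, "mappa": 30},
    "2020-03-09": {"news": 40, "Italia": 70, "Mappa": 20, "maps": 35,
                   "autocertificazione": 80},
}

weekly_ranks = {
    week: rank_queries(merge_queries(scores))
    for week, scores in weekly_scores.items()
}
```

Running the same two steps separately on national and regional data would give one ranking per area and week, which is the kind of table a rank-over-time figure needs.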

We observe that the top searched queries in a region correspond to the words "News", "Italia" (i.e. Italy), the region's own name, and its main cities, suggesting that people seek information about the spread of the epidemic near where they live. The same pattern emerges at the national level, with only the most populated cities appearing among the top queries. While sharing the main trend, some searches associated with coronavirus highlight differences between regions. For example, the word "Cina" (i.e. China) was searched more in Veneto and Lazio than in Campania, and "Maps" was not queried in Campania or Puglia as much as in Veneto and Lombardia. Interestingly, "Autocertificazione" (i.e. self-certification) was highly searched in regions such as Veneto or Toscana, compared to Lombardia, Lazio and Campania.


Conclusion

These preliminary analyses have left us with many possible interpretations but also a lot of questions. What are the typical dynamics of the Searches index? Unless the same query returns different results over time, a given user is unlikely to repeat the same search, so a natural decay is expected, but it has not been characterized yet. And how do the different kinds of interventions influence the dynamics? Some interventions seem to be linked with a rise of interest, while others seem to lower it.

Of course, the evolution of interest needs to be compared with the information offered by the national news outlets. How does the availability of information from national or regional news outlets influence the need for information that people try to satisfy on the internet?

We have also observed a strong geographical dependence on the regional and national measures taken: the sense of “closeness” influences people’s search patterns. This leads to the question of whether it is possible to profile regions by their information needs.

Another interesting question could be: what difference in search patterns can we observe when comparing different nations? How did the governmental intervention impact the searches there?

Credits:
Searches data were gathered from Google Trends through the pytrends Python library.
Interventions data come from this repository, by Complexity Science Hub Vienna.